[AURON #2236] Support compile with Celeborn 0.6 and spark 4.0 #2243

Merged
cxzl25 merged 3 commits into apache:master from ftong2020:celeborn-0-6-with-spark-4-0 on May 9, 2026

Conversation

@ftong2020
Contributor

Which issue does this PR close?

Closes #2236

Rationale for this change

Spark 4.0 has been released for over a year, and Celeborn 0.6 provides official Spark 4.0 support.

This change adds Spark 4.0 support when Auron is compiled with Spark 4.0 and Celeborn 0.6.

Spark 4.1 support is not included in this PR, since the latest Celeborn release does not support that configuration. Spark 4.1 also changed the signature of MapStatus.apply(), which would demand heavy changes to the codebase.

What changes are included in this PR?

Are there any user-facing changes?

No.

How was this patch tested?

Tested in our staging environment.

Contributor

Copilot AI left a comment


Pull request overview

Adds Spark 4.0 compilation support when building Auron with Celeborn 0.6 by updating Spark-version-gated APIs and aligning Spark 4’s shuffle-write execution behavior with existing Spark 3.x assumptions.

Changes:

  • Extend @sparkver gating for getPartitionLengths() to include Spark 4.0 in Celeborn 0.6 and RSS shuffle writer implementations.
  • Update NativeRDD.compute() to also defer execution for RSS_SHUFFLE_WRITER plans on Spark 4+ (to avoid early execution under the refined ShuffleWriteProcessor flow).
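
The two changes above could look roughly like the following Scala sketch. This is a hypothetical reconstruction: the exact @sparkver version list, field names, and plan-kind checks are assumptions, not the actual Auron code.

```scala
// Hypothetical sketch of the two changes, not the actual Auron source.

// 1. Version-gated override: Spark 4.0 is added to the @sparkver gate so the
//    same getPartitionLengths() override also compiles under the Spark 4.0
//    profile (the concrete version list here is an assumption).
@sparkver("3.2 / 3.3 / 3.4 / 3.5 / 4.0")
override def getPartitionLengths(): Array[Long] = partitionLengths

// 2. NativeRDD.compute(): the Spark 4+ deferral is widened so that
//    RSS_SHUFFLE_WRITER plans, like SHUFFLE_WRITER plans, are not executed
//    early under the refined ShuffleWriteProcessor flow (names assumed).
val deferNativeExecution =
  isSparkVersionAtLeast40 &&
    (planKind == PlanKind.SHUFFLE_WRITER || planKind == PlanKind.RSS_SHUFFLE_WRITER)
```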

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Changed files:

  • thirdparty/auron-celeborn-0.6/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/celeborn/AuronCelebornShuffleWriter.scala: adds Spark 4.0 to the version-gated getPartitionLengths() override for the Celeborn 0.6 integration.
  • spark-extension/src/main/scala/org/apache/spark/sql/execution/auron/shuffle/AuronRssShuffleWriterBase.scala: adds Spark 4.0 to the version-gated getPartitionLengths() override for the RSS shuffle writer base.
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeRDD.scala: extends the Spark 4+ shuffle-write deferral logic to also cover RSS shuffle writer plans.


@yew1eb
Contributor

yew1eb commented May 7, 2026

@ftong2020 Thanks for the contribution! Please update the PR title to include the issue ID: [AURON #2236]

ftong2020 changed the title from "support compile with Celeborn 0.6 and spark 4.0" to "[AURON #2236]support compile with Celeborn 0.6 and spark 4.0" on May 8, 2026
@ftong2020
Contributor Author

@ftong2020 Thanks for the contribution! Please update the PR title to include the issue ID: [AURON #2236]

Sure.

cxzl25 changed the title from "[AURON #2236]support compile with Celeborn 0.6 and spark 4.0" to "[AURON #2236] Support compile with Celeborn 0.6 and spark 4.0" on May 8, 2026
@cxzl25
Contributor

cxzl25 commented May 8, 2026

Spark 4.1 support is not included in this PR, since the latest Celeborn release does not support that configuration. Spark 4.1 also changed the signature of MapStatus.apply(), which would demand heavy changes to the codebase.

The newly added parameter of MapStatus.apply has a default value. Does this have any impact on supporting Spark 4.1?

  def apply(
      loc: BlockManagerId,
      uncompressedSizes: Array[Long],
      mapTaskId: Long,
      checksumVal: Long = 0): MapStatus = {

@ftong2020
Contributor Author

Spark 4.1 support is not included in this PR, since the latest Celeborn release does not support that configuration. Spark 4.1 also changed the signature of MapStatus.apply(), which would demand heavy changes to the codebase.

The newly added parameter of MapStatus.apply has a default value. Does this have any impact on supporting Spark 4.1?

  def apply(
      loc: BlockManagerId,
      uncompressedSizes: Array[Long],
      mapTaskId: Long,
      checksumVal: Long = 0): MapStatus = {

I tried it; it is the Celeborn client itself that does not support Spark 4.1. Spark throws the following exception whether or not Auron is enabled:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (4a802e9ef77a executor driver): java.lang.NoSuchMethodError: 'org.apache.spark.scheduler.MapStatus org.apache.spark.scheduler.MapStatus$.apply(org.apache.spark.storage.BlockManagerId, long[], long)'
at org.apache.spark.shuffle.celeborn.SparkUtils.createMapStatus(SparkUtils.java:87)
at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:385)
at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:176)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180)
at org.apache.spark.scheduler.Task.run(Task.scala:147)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:716)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:86)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:83)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:97)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:719)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$3(DAGScheduler.scala:3122)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3122)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3114)
at scala.collection.immutable.List.foreach(List.scala:323)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3114)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1303)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1303)
at scala.Option.foreach(Option.scala:437)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1303)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3397)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3328)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3317)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:50)
Caused by: java.lang.NoSuchMethodError: 'org.apache.spark.scheduler.MapStatus org.apache.spark.scheduler.MapStatus$.apply(org.apache.spark.storage.BlockManagerId, long[], long)'
at org.apache.spark.shuffle.celeborn.SparkUtils.createMapStatus(SparkUtils.java:87)
at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.close(HashBasedShuffleWriter.java:385)
at org.apache.spark.shuffle.celeborn.HashBasedShuffleWriter.write(HashBasedShuffleWriter.java:176)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:180)
at org.apache.spark.scheduler.Task.run(Task.scala:147)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:716)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:86)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:83)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:97)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:719)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
26/05/08 14:56:25 WARN ShuffleClientImpl: Shuffle client has been shutdown!
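
The NoSuchMethodError above is a binary-compatibility failure: adding a parameter to MapStatus.apply, even one with a default value, changes the JVM method descriptor, so Celeborn's SparkUtils.java, compiled against the old three-argument signature, cannot link at runtime. A minimal Scala illustration (simplified stand-in types, not Spark or Celeborn code):

```scala
// Simplified stand-in, not Spark/Celeborn code: why a Scala default parameter
// does not preserve binary compatibility for already-compiled callers.
object MapStatusLike {
  // Spark 4.0's shape was roughly apply(loc, uncompressedSizes, mapTaskId).
  // Spark 4.1 adds checksumVal with a default. scalac now emits only the
  // four-argument descriptor plus a synthetic apply$default$4() method;
  // the old three-argument descriptor disappears from the bytecode.
  def apply(
      loc: String,
      uncompressedSizes: Array[Long],
      mapTaskId: Long,
      checksumVal: Long = 0): Long = mapTaskId + checksumVal
}

// A Scala caller recompiled against 4.1 is fine (the compiler fills in the
// default at the call site), but a pre-compiled Java caller that invokes the
// old three-argument descriptor fails with java.lang.NoSuchMethodError at
// runtime, exactly as in the stack trace above.
```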

@cxzl25
Contributor

cxzl25 commented May 9, 2026

I tried it; it is the Celeborn client itself that does not support Spark 4.1. Spark throws the following exception whether or not Auron is enabled.

Sorry, I forgot: Celeborn 0.7.0 needs to be released to support Spark 4.1.

https://issues.apache.org/jira/browse/CELEBORN-2239

@cxzl25 cxzl25 merged commit 6065c55 into apache:master May 9, 2026
123 of 125 checks passed


Successfully merging this pull request may close these issues.

Can not compile auron targeting scalar 2.13 and spark 4 when celeborn is enabled

4 participants